Abstract

In [1] a Bayesian network for analysis of mixed traces of DNA was presented using gamma distributions
for modelling peak sizes in the electropherogram. It was demonstrated that the analysis was sensitive to 
the choice of a variance factor and hence this should be adapted to any new trace analysed. In the present 
paper we discuss how the variance parameter can be estimated by maximum likelihood. The unknown
proportions of DNA from each contributor can similarly be estimated by maximum likelihood jointly with the
variance parameter. Furthermore we discuss how to incorporate prior knowledge about the parameters via a
Bayesian analysis. The proposed estimation methods are illustrated through a few examples of applications
for calculating evidential value in casework and for mixture deconvolution.

Keywords: Bayesian network, forensic identification, Markov chain Monte Carlo methods, mixture 
separation

1. Introduction

We consider a model for analysing a mixed trace of DNA using information about peak sizes for each
present allele, as obtained from the electropherogram for that trace. There is now a substantial body of
literature on methods for exploiting this information in the analysis and interpretation of DNA mixtures, 
for example J2, H, B B, B EL 111] and a number of articles using probabilistic expert systems and Bayesian 
networks, for example [9, 1, 10]. The present paper follows the paradigm in the latter three articles.

An important parameter in the analyses based on Bayesian networks was a variance factor in the peak
size distribution. In [1] a fixed value was used for the variance factor across all markers and all cases,
although there were signs of sensitivity to the choice of this value. It was therefore suggested in [1] that this
parameter should be adapted to each case. In the present paper we respond to this suggestion by developing
methods for simultaneously estimating the variance factor and the unknown mixture proportions for a given
trace.



2. A Bayesian network for DNA mixture analysis 

The model is implemented as a Bayesian network along the lines described in [1]. Below we summarize
some of the main features of the model and its use.

2.1. The gamma model for peak sizes

For each allele present in the mixture the size of the corresponding peak is observed, either represented 
by the peak area or peak height, possibly corrected for preferential amplification. A key assumption is that 
the peak size is roughly proportional to the pre-amplification amount of the corresponding allele [5]. 



We adopt the gamma model described in [1] and partly justified in [11]. The model assumes a
known number of contributors, and for technical simplicity we consider here only cases with two
contributors and do not allow for artefacts such as stutter and drop-out. We also assume that the pre-amplification
proportions of DNA from the two contributors are constant across markers. We represent the proportion of
DNA originating from one of the contributors by $\theta$; $\theta$ is then a number between 0 and 1.

In [1] it is assumed that, for fixed genotypes of the contributors and a fixed mixture proportion, the peak
size $W_a$ of allele $a$ at a given marker is independent of peak sizes of other alleles and gamma distributed as

$$W_a \sim \Gamma(\beta\mu_a, \eta), \tag{1}$$

where

$$\mu_a = \{\theta n_a^1 + (1 - \theta) n_a^2\}/2 \tag{2}$$

and $n_a^1$ and $n_a^2$ denote the number of alleles of type $a$ at a given marker in the genotype of each contributor.
Thus, for example, if the first contributor has genotype (13, 15) and contributed 40% of the DNA, and the
second contributor has genotype (15, 15), then $n_{13}^1 = n_{15}^1 = 1$, $n_{15}^2 = 2$, and all other $n_a^i$ are equal to zero.
Hence, in this case

$$\mu_{13} = \theta/2 = 0.20, \qquad \mu_{15} = \{\theta + 2(1 - \theta)\}/2 = 1 - \theta/2 = 0.80. \tag{3}$$

At each marker the peak sizes $(W_1, \ldots, W_A)$ are scaled by their sum such that the resulting relative peak
sizes $(R_1, \ldots, R_A)$ add up to 1. We let $R$ denote the total set of observed relative peak sizes for all markers.

The relative peak sizes are independent between markers and each $R_a$ then follows a beta distribution
with mean and variance given as

$$\mathrm{E}\, R_a = \mu_a, \qquad \mathrm{Var}\, R_a = \sigma^2 \mu_a (1 - \mu_a),$$

where we have let $\sigma = 1/\sqrt{\beta + 1}$. Hence $\mu_a$ is the mean (relative) peak size for allele $a$ so, for example, in
the mixture (3) above we would expect the peak at allele 15 to be about four times as large as that at 13.
Also, $\sigma$ is a measure of the generic peak imbalance: for a single heterozygous contributor with allele $a$ we
have $\mu_a = 1/2$ and therefore expect two peaks of the same size; the coefficient of variation for one such peak
being

$$\frac{\sqrt{\mathrm{Var}\, R_a}}{\mathrm{E}\, R_a} = \sigma,$$

i.e. if $\sigma = 0.07$, say, the standard deviation of such a relative peak area is 7% of its mean.
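
To make these moment calculations concrete, the following minimal Python sketch evaluates the mean relative peak sizes of equation (2) for the two-person example and the coefficient of variation for a single heterozygous contributor. The function and variable names are illustrative and are not taken from [1]; the value 0.07 for the peak imbalance is only the example value used in the text.

```python
from collections import Counter
from math import sqrt


def mean_relative_peaks(genotype1, genotype2, theta):
    """Mean relative peak size mu_a for each allele at one marker, equation (2).

    genotype1, genotype2: pairs of alleles for contributors 1 and 2.
    theta: proportion of DNA from contributor 1 (0 < theta < 1).
    """
    n1, n2 = Counter(genotype1), Counter(genotype2)
    alleles = sorted(set(n1) | set(n2))
    # Counter returns 0 for alleles a contributor does not carry.
    return {a: (theta * n1[a] + (1 - theta) * n2[a]) / 2 for a in alleles}


def relative_peak_sd(mu_a, sigma):
    """Standard deviation of R_a, i.e. sqrt(sigma^2 * mu_a * (1 - mu_a))."""
    return sigma * sqrt(mu_a * (1 - mu_a))


# Example from the text: contributor 1 is (13, 15) with 40% of the DNA,
# contributor 2 is (15, 15).
mu = mean_relative_peaks((13, 15), (15, 15), theta=0.4)
print(mu)

# Single heterozygous contributor: mu_a = 1/2, so sd / mean equals sigma.
sigma = 0.07  # illustrative value
cv = relative_peak_sd(0.5, sigma) / 0.5
print(cv)
```

The dictionary `mu` holds the values of equation (3), and `cv` recovers the peak imbalance as the coefficient of variation.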

The parameter $\beta$ is related to the heterozygote balance (Hb) as described in [4], i.e. the ratio between the
peak sizes for the two alleles. The gamma model implies that Hb is $F(\beta, \beta)$-distributed. For a case where
$\sigma = 0.07$ we get $\beta = 203.08$ and a 95% prediction interval for Hb would be 0.759 < Hb < 1.318 which
conforms well with previous findings |0,la0]. 
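
The quoted prediction interval can be checked by simulation. The sketch below assumes, as in the gamma model, that the two peaks of a single heterozygous contributor are independent gamma variables with common shape $\beta/2$ and a common scale, which cancels in the ratio. Monte Carlo quantiles stand in for exact $F$ quantiles so that only the Python standard library is needed; the seed, the sample size and the function name are arbitrary choices made for this illustration.

```python
import random


def simulate_hb_interval(sigma, n_draws=100_000, level=0.95, seed=1):
    """Monte Carlo prediction interval for the heterozygote balance Hb = W_a / W_b."""
    beta = 1.0 / sigma**2 - 1.0      # from sigma = 1 / sqrt(beta + 1)
    shape = beta / 2.0               # beta * mu_a with mu_a = 1/2
    rng = random.Random(seed)
    ratios = sorted(
        rng.gammavariate(shape, 1.0) / rng.gammavariate(shape, 1.0)
        for _ in range(n_draws)
    )
    tail = (1.0 - level) / 2.0
    lower = ratios[int(tail * n_draws)]
    upper = ratios[int((1.0 - tail) * n_draws) - 1]
    return beta, lower, upper


beta, lower, upper = simulate_hb_interval(0.07)
print(round(beta, 2), round(lower, 3), round(upper, 3))
```

Up to Monte Carlo error, the empirical quantiles should agree with the interval stated above.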

2.2. DNA mixture analysis 

Based on the relative peak areas and the Bayesian network, two key questions can be addressed: a 
mixture deconvolution which attempts to determine the DNA profiles of the unknown contributors to the 
mixture, and the calculation of an evidential value for the comparison of specific hypotheses concerning the 
composition of the observed mixture.

Mixture deconvolution. The DNA profiles of the contributors to a mixture can be predicted by a ranked
list of probable profile pairs based on the information in the peak sizes, i.e. ranking the genotypes $(c_1, c_2)$
according to their probabilities $p(c_1, c_2 \mid R, \theta, \sigma)$. Note that both $\theta$ and $\sigma$ are unknown and therefore need to
be estimated.

Evidential value. Suppose we have a reference profile from an individual which we shall term the suspect
and wish to compare two specific hypotheses $H_p$ and $H_d$, entertained by the prosecution and defence, for
example

$H_p$: "The suspect and one unknown individual have contributed to the trace"
$H_d$: "Two unknown individuals have contributed to the trace".

We consider contributors to be unrelated and the unknown individuals drawn at random from a specific
population. To assess the strength of the evidence we wish to calculate the likelihood ratio $LR$ of $H_p$ against
$H_d$:

$$LR = \frac{p(R \mid H_p, \theta, \sigma)}{p(R \mid H_d, \theta, \sigma)}, \tag{4}$$

where we again note the dependency of this ratio on the unknown parameters $\theta$ and $\sigma$.

2.3. Data and software

We illustrate the methods using relative peak sizes from two mixtures with partial or complete
knowledge of the contributors also used in [1], denoted the Evett [2] and Perlin [3] data respectively. The peak
sizes are adjusted for preferential amplification by scaling the areas by the repeat number for the
corresponding allele. The Evett data (Table 1) consist of the relative peak sizes from a mixture in 10:1 ratio with
a known profile for the main contributor. The Perlin data (Table 2) are from a 7:3 ratio mixture with two
known contributors.

We follow [1] and use allele frequencies for the US Caucasian population as given in [12]. One of the
observed alleles, allele 25.2 at marker FGA, found in the Perlin dataset was not present in the database, so
the two known profiles under study were added to the database and allele frequencies updated accordingly.
We have used the software R [13] and HUGIN [14] for calculations in the examples. Through the
R-package RHugin [15] it has been possible to perform all computations from within R and hence take direct
advantage of the statistical tools available in R as well as those provided by HUGIN for efficient computation
in Bayesian networks.



3. Methods for parameter estimation 

We now turn to the problem of estimating the unknown quantities $\sigma$ and $\theta$. We discuss three methods
for doing so.

(i) In the first method we proceed as in [1] and include $\theta$ in discretised form directly as a node in the
Bayesian network with a uniform distribution. Instead of fixing a value of $\sigma$ in advance we estimate $\sigma$
by the method of maximum likelihood based on the case data at hand;

(ii) The second method also treats $\theta$ as a fixed and unknown parameter and then estimates both $\sigma$ and $\theta$ by
maximum likelihood;

(iii) A third approach exploits prior information on both $\sigma$ and $\theta$ to perform a fully Bayesian analysis using
Markov chain Monte Carlo methods [16].

Table 1: Evett data. The person with DNA profile $c_1$ is the
major contributor. The profile for the minor contributor is
unknown.

Marker  Allele  R       c1
D8      10      0.4347  10
        11      0.0285
        14      0.5368  14
D18     13      0.8871  13
        16      0.0536
        17      0.0592
D21     59      0.0525
        65      0.0676
        67      0.4284  67
        70      0.4515  70
FGA     21      0.5699  21
        22      0.3908  22
        23      0.0393
TH01    8       0.4015  8
        9.3     0.5985  9.3
VWA     16      0.4170  16
        17      0.0884
        18      0.4747  18
        19      0.0199


Table 2: Perlin data. The person with DNA profile $c_1$ is the
major contributor.

Marker  Allele  R       c1    c2
D2      16      0.1339        16
        18      0.2992  18
        20      0.1947        20
        21      0.3722  21
D3      14      0.5010  14    14
        15      0.4990  15    15
D8      9       0.2832  9
        12      0.1426        12
        13      0.3829  13
        14      0.1913        14
D16     11      0.6801  11
        13      0.1607        13
        14      0.1593        14
D18     12      0.1504        12
        13      0.3290  13
        14      0.3443  14
        17      0.1764        17
D19     12.2    0.3109  12.2
        14      0.3092        14
        15      0.3799  15
D21     27      0.1289        27
        29      0.3913  29
        30      0.4798  30    30
FGA     19      0.4621  19    19
        24      0.1561        24
        25.2    0.3817  25.2
TH01    6       0.1268        6
        7       0.4691  7     7
        9       0.4041  9
VWA     17      0.7265  17
        18      0.2735        18



3.1. Maximum likelihood estimation of $\sigma$

The likelihood function for $\sigma$ is obtained by averaging out over all possible compositions $(c_1, c_2, \theta)$ of
the mixture:

$$L(\sigma) = p(R \mid H, \sigma) = \sum_{c_1, c_2, \theta} \Big\{ \prod_m p(R^m \mid c_1^m, c_2^m, \theta, \sigma) \Big\}\, p(c_1, c_2, \theta \mid H),$$

where $R^m$, $c_1^m$ and $c_2^m$ are relative peak sizes and genotypes for each marker $m$ and $H$ denotes a specific
hypothesis under consideration.

Direct computation of the likelihood function using this expression is not feasible as the number of
possible mixture compositions $(c_1, c_2, \theta)$ typically is astronomical. However, the exact value of $L(\sigma)$ can
be obtained as the normalising constant from propagation of likelihood evidence in the Bayesian network
which is what we have used here. We omit the technical details.

Figure 1 shows the likelihood function and its logarithm for the Perlin data when considering both
contributors unknown. The likelihood function can for example be maximised using a general numeric
algorithm for maximising a real function. The likelihood function for the Perlin data has a maximum at
$\hat\sigma = 0.0722$, indicating a peak imbalance of about 7%. The shape of $\ell(\sigma)$ around its maximum indicates that
the uncertainty of the MLE can reasonably be based on asymptotic normality using the second derivative of
the log-likelihood function as

$$\mathrm{Var}(\hat\sigma) \approx -1/\ell''(\hat\sigma).$$

This quantity can again be found by numerical differentiation; combined with using the normalising constant
from propagation in the Bayesian network for exact computation of $\ell$, this is an extremely fast method.
Using this method for the Perlin data we obtain a 99% confidence interval for $\sigma$ of (0.0441, 0.1003). In
comparison, [1] used a value of $\sigma^2 = 0.01$ corresponding to $\sigma = 0.1$, which is just inside the confidence
interval calculated.

Figure 1: The likelihood function $L(\sigma)$ and its logarithm $\ell(\sigma) = \log L(\sigma)$ for the Perlin data and a scenario of two unknown
contributors.

3.2. Maximum likelihood estimation of $\sigma$ and $\theta$

In contrast to the previous section we now also consider $\theta$ as a parameter and thus estimate both $\theta$ and
$\sigma$ by maximising the likelihood function

$$L(\theta, \sigma) = p(R \mid H, \theta, \sigma) = \sum_{c_1, c_2} \Big\{ \prod_m p(R^m \mid c_1^m, c_2^m, \theta, \sigma) \Big\}\, p(c_1, c_2 \mid H) = \prod_m \sum_{c_1^m, c_2^m} p(R^m \mid c_1^m, c_2^m, \theta, \sigma)\, p(c_1^m, c_2^m \mid H).$$

To obtain the last equality we have used that when both $\theta$ and $\sigma$ are fixed, the genotypes and peak
sizes are all independent between markers. The internal sums in the last expression can be calculated as
they stand, as each only involves genotypes at a single marker. Alternatively, $L(\theta, \sigma)$ can also here be found
from the normalising constant from propagation of the likelihood evidence.

The asymptotic covariance matrix for the estimates is obtained from the second derivatives of the
log-likelihood function as before. Again we have maximised the likelihood function and found its derivatives
by numerical methods.

In the left-hand panel of Figure 2 we see the likelihood function for the Perlin data obtained in the
case with two unknown contributors. Unsurprisingly, the likelihood is symmetrical around $\theta = 0.5$, because
the labelling of contributors is arbitrary. The right-hand panel of Figure 2 shows the likelihood function
when the DNA profiles of both contributors are specified; the likelihood function picks up which of the
two contributors is the major contributor and again correctly estimates the proportion of DNA from this
contributor to be around 0.7.




Figure 2: The likelihood function $L(\sigma, \theta) = p(R; \sigma, \theta)$ for the Perlin data with two unknown contributors (left). To the right the
likelihood function after specifying the DNA profiles for two contributors.

The maximum likelihood estimates for $\sigma$ and $\theta$ are displayed in Table 3. The estimates $\hat\sigma$ and $\hat\theta$ are close
to being independent with asymptotic correlations in the three situations for the Perlin data being -0.1947,
-0.0422, and -0.0422. For the Evett data it is -0.1599 in both situations. For both data sets the estimated
mixture proportions $\hat\theta$ are remarkably close to the proportions used for constructing the DNA mixture. In
contrast to the model using a uniformly distributed $\theta$, the Perlin data do not quite support the use of
$\sigma = 0.1$ although it is not far off.

For the Perlin data, if we include genotypes of the minor contributor as a potential contributor we get 
better estimates of the parameters which is reflected in the narrower confidence intervals. When further 
including the DNA profiles of both contributors as known, the estimates do not change at all. For the Evett 
dataset, specifying genetic information on a potential contributor barely changes the estimates. 
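
For the situation in which both contributors are known, the joint likelihood reduces to a product over markers of Dirichlet densities with parameters $\beta\mu_a$, so the maximisation can be sketched in a few lines. The code below is a simplified illustration and not the RHugin implementation behind the reported results: it hard-codes the Perlin data of Table 2, takes $\theta$ to be the proportion from the major contributor, replaces the general numerical optimiser by a two-stage grid search, and derives Wald intervals from a finite-difference Hessian. All function names are hypothetical, and the output is not expected to reproduce Table 3 to every digit.

```python
from math import lgamma, log, sqrt

# Perlin data (Table 2): marker -> list of (relative peak size, n_major, n_minor),
# where n_major / n_minor count copies of the allele in each known genotype.
PERLIN = {
    "D2":   [(0.1339, 0, 1), (0.2992, 1, 0), (0.1947, 0, 1), (0.3722, 1, 0)],
    "D3":   [(0.5010, 1, 1), (0.4990, 1, 1)],
    "D8":   [(0.2832, 1, 0), (0.1426, 0, 1), (0.3829, 1, 0), (0.1913, 0, 1)],
    "D16":  [(0.6801, 2, 0), (0.1607, 0, 1), (0.1593, 0, 1)],
    "D18":  [(0.1504, 0, 1), (0.3290, 1, 0), (0.3443, 1, 0), (0.1764, 0, 1)],
    "D19":  [(0.3109, 1, 0), (0.3092, 0, 2), (0.3799, 1, 0)],
    "D21":  [(0.1289, 0, 1), (0.3913, 1, 0), (0.4798, 1, 1)],
    "FGA":  [(0.4621, 1, 1), (0.1561, 0, 1), (0.3817, 1, 0)],
    "TH01": [(0.1268, 0, 1), (0.4691, 1, 1), (0.4041, 1, 0)],
    "VWA":  [(0.7265, 2, 0), (0.2735, 0, 2)],
}


def log_lik(theta, sigma):
    """Log-likelihood of (theta, sigma) when both genotypes are known."""
    beta = 1.0 / sigma**2 - 1.0
    total = 0.0
    for peaks in PERLIN.values():
        total += lgamma(beta)  # Dirichlet parameters sum to beta at each marker
        for r, n1, n2 in peaks:
            alpha = beta * (theta * n1 + (1.0 - theta) * n2) / 2.0
            total += (alpha - 1.0) * log(r) - lgamma(alpha)
    return total


def grid_max(t_lo, t_hi, s_lo, s_hi, steps):
    """Return the grid point (theta, sigma) with the largest log-likelihood."""
    best = None
    for i in range(steps + 1):
        theta = t_lo + (t_hi - t_lo) * i / steps
        for j in range(steps + 1):
            sigma = s_lo + (s_hi - s_lo) * j / steps
            value = log_lik(theta, sigma)
            if best is None or value > best[0]:
                best = (value, theta, sigma)
    return best[1], best[2]


# Stage 1: coarse grid; stage 2: finer grid around the coarse maximum.
theta0, sigma0 = grid_max(0.55, 0.95, 0.02, 0.30, 40)
theta_hat, sigma_hat = grid_max(theta0 - 0.01, theta0 + 0.01,
                                sigma0 - 0.007, sigma0 + 0.007, 40)

# Second derivatives of the log-likelihood by central finite differences.
h = 1e-4
f0 = log_lik(theta_hat, sigma_hat)
d_tt = (log_lik(theta_hat + h, sigma_hat) - 2 * f0 + log_lik(theta_hat - h, sigma_hat)) / h**2
d_ss = (log_lik(theta_hat, sigma_hat + h) - 2 * f0 + log_lik(theta_hat, sigma_hat - h)) / h**2
d_ts = (log_lik(theta_hat + h, sigma_hat + h) - log_lik(theta_hat + h, sigma_hat - h)
        - log_lik(theta_hat - h, sigma_hat + h) + log_lik(theta_hat - h, sigma_hat - h)) / (4 * h**2)

# Asymptotic covariance = inverse of the negative Hessian (2 x 2 case).
det = d_tt * d_ss - d_ts**2
var_theta = -d_ss / det
var_sigma = -d_tt / det

z99 = 2.5758  # standard normal quantile for a two-sided 99% interval
ci_theta = (theta_hat - z99 * sqrt(var_theta), theta_hat + z99 * sqrt(var_theta))
ci_sigma = (sigma_hat - z99 * sqrt(var_sigma), sigma_hat + z99 * sqrt(var_sigma))
print(theta_hat, ci_theta)
print(sigma_hat, ci_sigma)
```

A grid search is used here only because it is easy to verify; any general-purpose optimiser could take its place.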

For the Perlin data, where $\sigma \approx 0.07$, a 95% prediction interval for the heterozygote balance Hb is
0.759 < Hb < 1.318. For the Evett case the generic peak imbalance is a bit higher, resulting in a slightly
wider range of expected heterozygote balance, 0.687 < Hb < 1.456. Note that for both the Perlin and the
Evett data the model leads to heterozygote balances that comply with the recommendation in [4].

3.3. Including prior information about $\sigma$ and $\theta$

In Section 3.1 it was seen that the DNA mixture can be modelled conditionally on the observed relative
peak sizes for a fixed $\sigma$ and a uniform distribution for $\theta$. We now explain how to combine this model with
prior information about the variability parameter $\sigma$ to perform Bayesian inference in the model.

Table 3: Joint maximum likelihood estimates of the mixture ratio and peak imbalance. The estimates of $\theta$ reflect the mixture ratios
used for constructing the data; a 7:3 ratio for the Perlin data, and a 10:1 ratio for the Evett data.

Perlin data
Genotype information              $\hat\sigma$   99% CI             $\hat\theta$   99% CI
Both contributors unknown         0.0701         (0.0402, 0.1000)   0.6921         (0.6576, 0.7267)
One known potential contributor   0.0674         (0.0408, 0.0940)   0.6956         (0.6665, 0.7247)
Both contributors known           0.0674         (0.0408, 0.0940)   0.6956         (0.6665, 0.7247)

Evett data
Genotype information              $\hat\sigma$   99% CI             $\hat\theta$   99% CI
Both contributors unknown         0.0956         (0.0444, 0.1468)   0.8948         (0.8579, 0.9317)
One known potential contributor   0.0956         (0.0444, 0.1468)   0.8948         (0.8579, 0.9317)

It is possible to simulate from $p(c_1, c_2, \sigma, \theta \mid H, R)$, for example by using a Gibbs sampler which
alternates between

1. sampling a complete configuration $(c_1, c_2, \theta)$ of genotypes and mixture proportion given the
current value of $\sigma$ and the observed relative peak sizes;

2. sampling $\sigma$ given the pair of DNA profiles sampled in the above step, the sampled mixture proportion,
and the observed relative peak sizes.

The first step is performed by sampling from the Bayesian network model of the DNA mixture after
including likelihood evidence using $\sigma$ and $R$. For the second step, $\sigma$ can be sampled by standard methods
for univariate sampling. In particular, provided that the prior distribution on $\beta = 1/\sigma^2 - 1$ is log-concave
(for instance a gamma distribution), the distribution of $\beta$ for a known composition of the DNA mixture is in
turn log-concave, which means that we can use adaptive rejection sampling [17] for this step.



4. Case analysis 

We now illustrate the use of the different estimation methods for a case analysis. In the full Bayesian
analysis, the uncertainty about the parameters is represented by their posterior distribution. For a full
specification of the Bayesian model we have used a uniform prior distribution for $\theta$ and a gamma distribution
for $\beta = 1/\sigma^2 - 1$ with parameters $\Gamma(1, 1/100)$ throughout. In a specific application it would be appropriate
to use previous experience for choosing a suitable prior distribution. Our choice is for illustration only.

4.1. Evidence calculation 

Evett data. For the Evett data we use the known major contributor as a potential suspect and compare the
hypotheses as in (4). Using the fitted parameters from Table 3 we find

$$\log_{10} LR = \log_{10} \frac{p(R \mid H_p, \hat\sigma, \hat\theta)}{p(R \mid H_d, \hat\sigma, \hat\theta)} = 8.534143.$$

We have here used the MLE for the situation where the known profile is specified to be a potential
contributor. Alternatively one could use a different MLE in numerator and denominator, corresponding to the
different hypotheses considered.

Figure 3: Bootstrap simulations of $\hat\sigma$, $\hat\theta$ and $\log_{10} LR$ for the Evett data, based on 2000 samples. The dashed lines indicate the
values as estimated on the Evett data set.

As the likelihood ratio is calculated using the parameter estimates, we can assess the uncertainty of
the estimate of $\log_{10} LR$ by parametric bootstrap: using the fitted parameters and the fact that relative sizes
are Dirichlet distributed, we simulate 2000 new sets of relative peak sizes and estimate the parameters for
each of these. A 99% bootstrap confidence interval for $\log_{10} LR$ is then (8.533970, 8.534143). Note
that this is very narrow, indicating that $\log_{10} LR$ is very accurately determined despite the uncertainty in $\sigma$
and $\theta$. Histograms of the bootstrap samples of parameter estimates and $\log_{10} LR$ values are displayed in
Figure 3. The shape of the histograms indicates that the distribution of the estimates is well approximated
by a Gaussian distribution. Note the very concentrated histogram for $\log_{10} LR$.
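
The simulation step of the parametric bootstrap can be sketched as follows. Relative peak sizes at one marker are generated by drawing independent gamma variables with shapes $\beta\mu_a$ and normalising them, which yields a Dirichlet vector. The mean vector and the value of the peak imbalance in the example are made up for illustration rather than taken from Table 3, the function name is hypothetical, and the re-estimation of the parameters for each simulated data set is left out.

```python
import random


def simulate_relative_peaks(mu, sigma, rng):
    """Draw one Dirichlet(beta * mu) vector of relative peak sizes for a marker."""
    beta = 1.0 / sigma**2 - 1.0
    weights = [rng.gammavariate(beta * m, 1.0) for m in mu]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(2024)   # fixed seed, arbitrary
mu = [0.45, 0.10, 0.45]     # hypothetical mean relative peak sizes
sigma = 0.1                 # hypothetical peak imbalance

# 2000 simulated markers, mirroring the number of bootstrap samples in the text.
samples = [simulate_relative_peaks(mu, sigma, rng) for _ in range(2000)]
means = [sum(s[a] for s in samples) / len(samples) for a in range(len(mu))]
print([round(m, 3) for m in means])
```

In a full bootstrap each simulated data set would cover all markers and would then be passed through the same estimation routine as the observed data.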

For the Bayesian analysis, the quantity of interest is the ratio of marginal likelihoods

$$\frac{p(R \mid H_p)}{p(R \mid H_d)}$$

with both $\theta$ and $\sigma$ integrated out; the numerator and denominator can be estimated from the Monte Carlo
samples as

$$p(R \mid H_p) \approx \frac{1}{N} \sum_{i=1}^{N} p(R \mid H_p, \sigma_i)$$

and similarly for $p(R \mid H_d)$. This yields a $\log_{10} LR$ of 8.233, somewhat smaller than the value obtained by
using maximum likelihood, but still representing overwhelming evidence that the suspect has contributed
to the mixture.
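
Because likelihood values of this kind can be extremely small or large, an average such as the one above is conveniently computed on the log scale. The helper below is a generic log-mean-exp sketch; the function name is hypothetical and the input log-likelihood values are invented for illustration, not taken from the analyses reported here.

```python
from math import exp, log


def log10_mean_likelihood(log_likelihoods):
    """log10 of the average of exp(l_i), computed stably via log-sum-exp."""
    top = max(log_likelihoods)
    total = sum(exp(l - top) for l in log_likelihoods)
    return (top + log(total / len(log_likelihoods))) / log(10.0)


# Hypothetical natural-log values of p(R | H, sigma_i) under the two hypotheses.
loglik_hp = [-95.2, -94.8, -96.1, -95.0]
loglik_hd = [-114.9, -115.3, -114.2, -116.0]

log10_lr = log10_mean_likelihood(loglik_hp) - log10_mean_likelihood(loglik_hd)
print(round(log10_lr, 3))
```

Subtracting the largest log-likelihood before exponentiating avoids numerical underflow without changing the result.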

Perlin data. For the Perlin data, we use the known minor contributor as a potential contributor, and consider
the likelihood ratio for $H_p$ against $H_d$ resulting in $\log_{10} LR = 14.942$ using the joint maximum likelihood
estimate $(\hat\theta, \hat\sigma)$ with a 99% bootstrap confidence interval of (13.328, 15.075), i.e. a considerably wider
interval than for the Evett data. The Monte Carlo estimate for $\log_{10} LR$ based on the marginal likelihood
ratio is 14.483. We have displayed histograms for bootstrapped parameter estimates and log-likelihood
ratios in Figure 4.

Again, the histograms indicate that the distribution of the estimates is well approximated by a Gaussian
distribution. Here $\log_{10} LR$ is more variable, indicating higher sensitivity to parameter uncertainty than for
the Evett data. Still, all values of $\log_{10} LR$ in the confidence interval provide evidence that the potential
contributor indeed did contribute to the mixture.

4.2. Mixture deconvolution 

Figure 4: Bootstrap simulations of $\hat\sigma$, $\hat\theta$ and corresponding $\log_{10} LR(\sigma, \theta)$ for the Perlin data. The dashed lines indicate the values
estimated from the Perlin data using the minor contributor as a potential contributor.

We produce a ranked list of probable profile pairs using the following trick, exploiting the fact that
sampling a set of DNA profiles for the contributors is straightforward regardless of the choice of estimation
method. We sample profile pairs $(c_1, c_2)$ from a DNA mixture model using the observed relative peak
sizes and the estimation method of preference. For each of the sampled profile pairs we then calculate the
probability of that particular pair, $p\{(c_1, c_2) \mid R\}$. Adding up these probabilities for all the sampled profile
pairs we get that they account for some total $p$ of the probability mass, implying that no undiscovered pair
of profiles can have probability larger than $1 - p$. Thus, if $k$ of the sampled profile pairs all have probability
larger than $1 - p$ they must constitute the $k$ most probable profiles. Note that the number $k$ of most probable
profiles is not fixed in advance, and that increasing the number of samples will result in a longer list of
profiles. We illustrate this method using the Perlin data.
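
The certification argument can be sketched in code as follows. The probabilities are hypothetical and stand for the values $p\{(c_1, c_2) \mid R\}$ of the distinct profile pairs found by sampling; the function name and the labels are invented for this example.

```python
def certified_top_profiles(sampled):
    """Return the sampled profile pairs guaranteed to head the full ranking.

    sampled: dict mapping a profile-pair label to its probability given R.
    """
    covered = sum(sampled.values())   # total probability mass p found so far
    threshold = 1.0 - covered         # no unseen pair can exceed this value
    ranked = sorted(sampled.items(), key=lambda item: item[1], reverse=True)
    certified = [(label, prob) for label, prob in ranked if prob > threshold]
    return certified, threshold


# Hypothetical probabilities for five distinct sampled profile pairs.
sampled = {"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.03, "E": 0.005}
top, threshold = certified_top_profiles(sampled)
print([label for label, _ in top], round(threshold, 3))
```

Here the five pairs cover a mass of 0.985, so only pairs with probability above 0.015 are certified. Pair E falls below that bound and could still be outranked by a pair that has not been sampled.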

If the model is fitted using maximum likelihood estimates for $\sigma$ and $\theta$, we can sample possible profiles
for the two contributors using the Bayesian network and thereby obtain pairs of profiles of high probability.
The sampling revealed 13 different profiles with a total probability of $p = 0.9992$. There were eight of
these profiles with a probability larger than $1 - p = 0.0008$, implying that the $k = 8$ most probable DNA
profile pairs had been determined. For seven of the markers the genotypes were correctly identified; for the
marker TH01 there is slight uncertainty whether the minor contributor supplied the allele 7 or 9; similarly,
for markers D19 and VWA there is slight uncertainty about the allocation of alleles to the contributors.
Table 4 displays the eight possible choices for the remaining three markers.

Table 4: The eight most probable contributor genotypes of the three uncertain markers for the Perlin data using the MLE for $\sigma$ and
$\theta$. Correctly predicted genotypes are marked with an asterisk.

      Major contributor                 Minor contributor
Rank  D19        TH01   VWA       D19         TH01   VWA       Prob.
1     12.2 15*   7 9*   17 17*    14 14*      6 7*   18 18*    0.647
2     12.2 15*   7 9*   17 18     14 14*      6 7*   17 17     0.261
3     12.2 14    7 9*   17 17*    15 15       6 7*   18 18*    0.054
4     12.2 14    7 9*   17 18     15 15       6 7*   17 17     0.022
5     14 15      7 9*   17 17*    12.2 12.2   6 7*   18 18*    0.006
6     12.2 15*   7 9*   17 17*    14 14*      6 9    18 18*    0.004
7     14 15      7 9*   17 18     12.2 12.2   6 7*   17 17     0.002
8     12.2 15*   7 9*   17 18     14 14*      6 9    17 17     0.001

Total probability                                              0.997



We note that for the Perlin data the most probable profile pair is the true one and the second most 
probable pair has a misclassification on only one marker, VWA. As for the analysis of evidential value, it is 
possible to assess the uncertainty of these rankings and the sensitivity to the choice of parameters, e.g. by 
bootstrap. We shall omit such further analysis here.

For the Bayesian analysis we use the Gibbs sampler to locate high probability pairs of profiles. In this
case we obtain 24 different profiles for each contributor. Subsequently we again use the Gibbs sampler to
obtain the posterior probability $p\{(c_1, c_2) \mid R\}$ for each pair as

$$p\{(c_1, c_2) \mid R\} \approx \frac{1}{N} \sum_{i=1}^{N} p\{(c_1, c_2) \mid R, \sigma_i\}.$$

This yields a total probability $p = 0.994$ for the 24 profiles and identifies seven profiles with a probability
larger than the resulting threshold $1 - p = 0.006$. Also in the Bayesian analysis all genotypes were correctly
identified for seven of the markers. The genotypes of the profile pairs for the remaining three markers are
displayed in Table 5.

Table 5: The seven most probable contributor genotypes of the three uncertain markers for the Perlin data in a fully Bayesian
analysis.

      Major contributor              Minor contributor
Rank  D19       TH01   VWA      D19     TH01   VWA      Prob.
1     12.2 15   7 9    17 17    14 14   6 7    18 18    0.511
2     12.2 15   7 9    17 18    14 14   6 7    17 17    0.315
3     12.2 14   7 9    17 18    15 15   6 7    17 17    0.068
4     12.2 14   7 9    17 17    15 15   6 7    18 18    0.056
5     12.2 15   7 9    17 17    14 14   6 9    18 18    0.008
6     12.2 15   7 9    17 18    14 14   6 9    17 17    0.007
7     12.2 15   7 9    17 17    14 14   6 7    17 18    0.007

Total probability                                       0.972



The Bayesian analysis reveals the same two profile pairs as the most probable but slightly reverses the 
ranking of profile pairs with smaller probabilities. Note, though, that the probabilities in Table 5 are Monte
Carlo estimates with a standard error of about 2% and thus the mutual ranking of profiles with very similar
probabilities is uncertain. In other words, the four most highly ranked configurations are definitely the top 
four on the list, whereas the mutual ranking of number three and four could be reversed, although this is 
unlikely. Similarly, the correct mutual ranking of number five, six, and seven could easily be reversed in 
comparison with the ranking in the table. 

5. Discussion 

In the present article we have demonstrated how both the mixture proportions and the unknown variance
factor in the model used by [1] can be estimated and their uncertainty incorporated into further analysis of the
DNA trace. The analysis shows that there is sufficient information in a single trace to do so using only the
peak sizes for the data at hand, if the variance factor $\sigma$ is taken to be marker independent.

In this paper we have illustrated how it is possible to assess the performance of a particular method 
for a given case. It would be well worthwhile to carry out a further study on the general performance, for 
example of the stability of the method for mixture deconvolution. However, we would like to emphasise 
that by performing a bootstrap analysis we get an indication of the information that data from mixtures of 
similar composition would provide about the questions in mind and the bootstrap analyses do indicate a 
considerable stability of the findings.

There might be good reasons to believe that the peak imbalance $\sigma$ differs across markers. As there
is limited information in the data from a single case we have for practical reasons chosen to ignore this
variation and use a single $\sigma$ for each case. An alternative way of accommodating marker dependence on the
generic variability would be to assume that the parameter $\beta = 1/\sigma^2 - 1$ in the gamma model (1) depends on
$m$ as $\beta_m = \lambda\delta_m$ where $\lambda$ depends only on the case at hand and $\delta_m$ is marker dependent but independent of the
case considered. One could then use laboratory data to estimate $\delta_m$ and only adapt $\lambda$ to the case considered.
This methodology would only demand minor technical variations for the Bayesian and maximum likelihood
methods developed in this paper. Generally, prior information on the total amount of DNA would also be
available which could be used to improve the Bayesian analysis by using an informative prior distribution
for $\lambda$.

We have only considered the simplest model with a fixed number of contributors and no artefacts in the
form of stutter, dropout, etc. Maximum likelihood estimates for the parameters can still be found when extending
to the case of multiple contributors, but as there would be more mixture proportions to estimate, larger
confidence intervals for the parameter estimates are to be expected. Models for artefacts [18] involve even
more parameters which need to be estimated in a similar way and this certainly adds to the general complexity
of the problem, as do issues of gene frequency uncertainties etc. [19]. We expect to address all these
issues in the future.